Variants of SGD for Lipschitz Continuous Loss Functions in Low-Precision Environments. (arXiv:2211.04655v2 [math.OC] UPDATED)
Motivated by neural network training in low-bit floating and fixed-point
environments, this work studies the convergence of variants of SGD with
computational error. Considering a general stochastic Lipschitz continuous loss
function, a novel convergence result to a Clarke stationary point is presented
under the assumptions that only an approximation of the stochastic gradient can
be computed and that additional error is incurred in computing the SGD step
itself. Different variants of SGD
are then tested empirically in a variety of low-precision arithmetic
environments, where improved test set accuracy is observed compared to SGD for
two image recognition tasks.
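
To make the setting concrete, below is a minimal sketch (not the paper's exact error model or SGD variants) of an SGD step in which both the stochastic gradient and the parameter update are rounded onto a hypothetical fixed-point grid, illustrating the two sources of computational error mentioned in the abstract; the function names, the quantization scale, and the toy nonsmooth objective are all assumptions introduced here for illustration.

```python
import numpy as np

def quantize(x, scale=2**-8):
    """Round to the nearest multiple of `scale` (toy fixed-point arithmetic)."""
    return np.round(x / scale) * scale

def sgd_step_low_precision(w, grad_fn, lr=0.05, scale=2**-8):
    """One SGD step with an approximate gradient and a rounded update.

    `grad_fn(w)` returns a stochastic (sub)gradient; both it and the updated
    iterate are quantized, modelling gradient error and step-computation error
    (the paper's analysis covers a more general error model).
    """
    g = quantize(grad_fn(w), scale)        # error in the stochastic gradient
    return quantize(w - lr * g, scale)     # error in computing the step itself

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy nonsmooth Lipschitz objective: f(w) = E|w - xi| with xi ~ N(1, 1);
    # its stochastic subgradient is sign(w - xi).
    grad_fn = lambda w: np.sign(w - rng.normal(1.0, 1.0, size=w.shape))
    w = np.zeros(1)
    for _ in range(2000):
        w = sgd_step_low_precision(w, grad_fn)
    print("final iterate:", w)  # drifts toward the median of xi (about 1)
```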